Exploring the Impact of COVID-19 on Population Demographics
GROUP-25
Vikram Kotlo 11608418
vikramkotlo@my.unt.edu
Introduction
Domain: Health and Demographics Analysis
This domain involves analyzing health-related datasets and
demographical data to understand patterns, trends, and factors
influencing public health outcomes.
The objective is to gain insights into various aspects of health,
including life expectancy, mortality rates, disease prevalence,
and the impact of epidemics such as COVID-19.
By looking into these datasets, we aim to identify correlations,
disparities, and potential interventions to improve global health
outcomes.
Workflow Explanation
Step 1: Data Collection (Data Gathering):
Obtain relevant datasets from sources like Kaggle, which provide comprehensive health and epidemiological
data.
Step 2: Initial Visualization (Exploratory Data Analysis):
Use tools like D3.js to create initial visualizations, providing an overview of data structure and potential
insights.
Step 3: Data Cleaning (Data Preprocessing):
Employ Python libraries like Pandas and NumPy to clean the datasets, handling missing values, outliers, and
inconsistencies.
Step 4: Refined Visualization (Advanced Visualizations):
Utilize Python libraries such as Matplotlib and Seaborn to create refined visualizations, focusing on specific
variables of interest and uncovering deeper insights.
Step 5: Dashboard Creation (Interactive Reporting):
Utilize platforms like Microsoft Power BI to design interactive dashboards and reports, integrating the refined
visualizations to provide a comprehensive view of the analyzed data.
Step 6: Report Generation (Insights and Conclusions):
Create detailed reports summarizing the analysis findings, insights, and conclusions drawn from the
visualizations and data analysis process, facilitating informed decision-making in public health policy and
interventions.
Data Abstraction: Dataset (Type and Attributes)
World Data:
Type: CSV
Attributes:
Country (object): Name of the country.
Year (int64): Year of observation.
Status (object): Development status classification of the country.
Life expectancy (float64): Average life expectancy in years.
Adult Mortality (int64): Mortality rate of adults per 1000 population.
Infant deaths (int64): Number of infant deaths per 1000 live births.
Alcohol (float64): Alcohol consumption in liters per capita.
Percentage expenditure (float64): Health expenditure as a percentage of GDP.
Hepatitis B (int64): Percentage of population vaccinated for Hepatitis B.
Measles (int64): Number of reported measles cases per 1000 population.
BMI (float64): Average Body Mass Index.
Population (float64): Population of the country.
COVID-19 Data:
Type: CSV
Attributes:
Country/Region (object): Name of the country or region.
Confirmed (int64): Total confirmed COVID-19 cases.
Deaths (int64): Total deaths due to COVID-19.
Recovered (int64): Total recoveries from COVID-19.
Active (int64): Active cases of COVID-19.
New cases (int64): New confirmed cases reported.
New deaths (int64): New deaths reported.
New recovered (int64): New recoveries reported.
Deaths / 100 Cases (float64): Percentage of deaths among confirmed cases.
Recovered / 100 Cases (float64): Percentage of recoveries among confirmed
cases.
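The two ratio columns in the COVID-19 table can be derived from the count columns. The following is a minimal sketch, assuming the ratios are plain percentages of Confirmed and that Active equals confirmed cases minus deaths and recoveries (an assumption about how the source file was built). The rows are made-up illustrative values, not real figures.

```python
import pandas as pd

# Hypothetical rows for illustration only; not real case counts.
covid = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "Confirmed": [1000, 200],
    "Deaths": [50, 4],
    "Recovered": [800, 100],
})

# Assumed definition: cases that are neither fatal nor recovered.
covid["Active"] = covid["Confirmed"] - covid["Deaths"] - covid["Recovered"]

# Percentages of confirmed cases, rounded to two decimals.
covid["Deaths / 100 Cases"] = (100 * covid["Deaths"] / covid["Confirmed"]).round(2)
covid["Recovered / 100 Cases"] = (100 * covid["Recovered"] / covid["Confirmed"]).round(2)

print(covid)
```

Recomputing the ratios this way is also a quick consistency check on a downloaded file: rows where the stored and recomputed values disagree deserve a closer look.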
Data Transformation
Data Transformation Techniques:
Handling missing values, scaling numerical features, encoding categorical features, and
normalizing data.
Handling Missing Values:
Use isnull().sum() to count the missing values in each column of both datasets, then drop or impute them.
Scaling Numerical Features:
Scale numerical features to a similar range to prevent biases in models.
Encoding Categorical Features:
Encode categorical features into numerical values using techniques like label encoding.
Normalizing Data:
Standardize numerical features (z-score) so that each has a mean of 0 and a standard deviation of 1,
which can improve model convergence and performance.
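The four techniques above can be sketched with Pandas alone. This is an illustration under stated assumptions, not the project's actual code: the sample rows are hypothetical, median imputation is one possible choice for the missing values, and the Status_code column name is introduced here for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; values are illustrative, not taken from the datasets.
df = pd.DataFrame({
    "Status": ["Developing", "Developed", "Developing", "Developed"],
    "Life expectancy": [65.0, np.nan, 59.0, 81.0],
    "GDP": [500.0, 40000.0, np.nan, 35000.0],
})

# 1. Count missing values per column (this only reports them).
missing_counts = df.isnull().sum()

# 2. Handle them: here, fill each numeric column with its median (assumed choice).
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 3. Label-encode the categorical column (codes follow sorted category order).
df["Status_code"] = df["Status"].astype("category").cat.codes

# 4a. Min-max scaling to the range [0, 1].
scaled = (df[numeric_cols] - df[numeric_cols].min()) / (
    df[numeric_cols].max() - df[numeric_cols].min()
)

# 4b. Standardization (z-score): mean 0, standard deviation 1.
standardized = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std(ddof=0)
```

Min-max scaling and standardization are alternatives rather than successive steps; which one fits depends on the downstream model.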
Task Abstraction
Task: Analyzing Health and Demographic Data
Target:
Gain insights into global health trends, identify factors influencing health
outcomes, and assess the impact of epidemics like COVID-19.
Actions:
Investigate the relationship between various factors (e.g., GDP, healthcare
expenditure) and life expectancy to understand health outcomes.
Explore adult mortality and infant mortality rates to identify regions with
high mortality rates and potential interventions.
Analyze data on diseases such as measles and hepatitis B to assess disease
prevalence and vaccination coverage.
Evaluate the spread and impact of COVID-19, including confirmed cases,
deaths, and recoveries, to inform public health responses.
Investigate the influence of socioeconomic factors like GDP, education, and
income composition on health outcomes.
Task Abstraction Workflow Diagram
Analyze Life Expectancy Trends
Explore the relationship between life expectancy and factors like GDP, healthcare expenditure, and vaccination
coverage.
Assess Mortality Rates
Investigate adult mortality and infant mortality rates to identify regions with high mortality rates and potential
interventions.
Study Disease Incidence
Analyze data on diseases such as measles and hepatitis B to assess disease prevalence and vaccination coverage.
Investigate COVID-19 Impact
Evaluate the spread and impact of COVID-19, including confirmed cases, deaths, and recoveries, to inform
public health responses.
Examine Socioeconomic Factors
Investigate the influence of socioeconomic factors like GDP, education, and income composition on health
outcomes.
Implementation Using Tools
Tools Used for Visualization:
D3.js:
Used for initial visualization of the uncleaned dataset, providing interactive and dynamic visualizations
on web pages (VizHub).
Python (Pandas, NumPy, Matplotlib, Seaborn):
Pandas and NumPy for data manipulation and preprocessing.
Matplotlib and Seaborn for creating static visualizations such as line plots, bar charts, and histograms.
Microsoft Power BI:
Used for creating interactive dashboards and reports, integrating refined visualizations to provide a
comprehensive view of the analyzed data.
Explanation:
D3.js enabled the creation of dynamic and interactive visualizations for exploring the uncleaned dataset,
providing an initial overview of the data's structure.
Python was used for data cleaning and preprocessing (Pandas, NumPy) and for generating static
visualizations (Matplotlib, Seaborn).
Microsoft Power BI was employed for creating interactive dashboards and reports, allowing for in-depth
exploration and analysis of the refined visualizations.
Results for Analysis
Initial Visualization Using D3.js: Pie Chart of COVID-19 Cases by WHO Region (pre-cleaning)
Explanation:
The pie chart provides a clear overview of how COVID-19 cases are
distributed among different WHO regions.
We can see at a glance which regions have the highest number of
confirmed cases, enabling policymakers and health organizations to
allocate resources effectively.
Regions with larger segments may require more attention in terms of
healthcare infrastructure and preventive measures.
Story:
Imagine global health officials convening to discuss strategies for combating
the spread of COVID-19. As they gather around a screen displaying the pie
chart, they immediately notice the significant portion representing the
European region, indicating a high number of confirmed cases.
This prompts discussions on implementing stricter containment measures
and increasing healthcare capacity in Europe.
Similarly, the smaller segments representing regions with fewer cases spark
conversations about sharing resources and best practices to prevent further
outbreaks.
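The numbers behind such a pie chart are one total per region. The chart itself was built in D3.js; the sketch below only shows the aggregation step in Pandas, assuming the COVID-19 table carries a WHO Region column (it is not in the attribute list above). Region labels and counts are hypothetical.

```python
import pandas as pd

# Hypothetical rows; region labels and counts are illustrative only.
covid = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "WHO Region": ["Europe", "Europe", "Africa", "Americas"],
    "Confirmed": [300, 100, 100, 500],
})

# Total confirmed cases per region: one value per pie slice.
by_region = covid.groupby("WHO Region")["Confirmed"].sum()

# Share of the global total, in percent, largest first.
share = (100 * by_region / by_region.sum()).sort_values(ascending=False)
print(share)
```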
Stacked Bar Chart of COVID-19 Cases by Country Using D3.js
Explanation:
The stacked bar chart allows for a detailed examination of COVID-19 cases within
individual countries. By breaking down each bar into segments for confirmed,
deaths, and recovered cases, viewers can see not only the total number of cases but
also the outcomes of those cases. This provides insights into how countries are
managing the pandemic, including their healthcare systems' capacity to treat
patients and mitigate fatalities.
Story:
As policymakers analyze the stacked bar chart, they focus on the disparities in
outcomes among the top 10 countries with the highest confirmed cases. They notice
that while some countries have a high number of confirmed cases, they've also
managed to achieve a substantial number of recoveries, indicating effective
healthcare interventions. Conversely, countries with a high proportion of deaths
prompt discussions on implementing measures to reduce mortality rates, such as
improving access to critical care facilities and vaccine distribution.
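Selecting the top countries and preparing the bar segments might look like the sketch below. One caveat: deaths and recoveries are subsets of confirmed cases, so stacking all three double counts. The sketch therefore stacks deaths, recoveries, and an assumed remainder (active cases) so each bar's height equals the confirmed total; this is a design choice for the example, not necessarily what the D3.js chart does. All values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; illustrative values only.
covid = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "Confirmed": [1000, 400, 50, 700],
    "Deaths": [50, 10, 1, 70],
    "Recovered": [800, 300, 40, 300],
})

TOP_N = 3  # the slides use the top 10; 3 keeps the example small

top = covid.nlargest(TOP_N, "Confirmed").set_index("Country/Region")

# Assumed split: cases that are neither deaths nor recoveries are still active.
segments = pd.DataFrame({
    "Deaths": top["Deaths"],
    "Recovered": top["Recovered"],
    "Active": top["Confirmed"] - top["Deaths"] - top["Recovered"],
})

# Each row of segments now sums to that country's confirmed total, so
# segments.plot(kind="bar", stacked=True) would draw bars of the right height.
print(segments)
```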
After Cleaning Datasets Using Python and Visualizing Using Matplotlib: Exploring Global Health Data
Explanation:
Data Cleaning: Removes extra spaces from column names for
consistency.
Summary Statistics: Computes key statistics for numeric columns.
Visualization: Displays the distribution of life expectancy using a
histogram.
Story:
In a health research project, data cleaning ensures consistency in
column names, aiding smooth analysis.
Summary statistics reveal crucial insights into life expectancy trends
worldwide.
Visualizing life expectancy distributions highlights variations, guiding
targeted interventions for global health improvement.
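The three steps can be sketched as follows. The CSV excerpt is hypothetical and only mimics the stray spaces around column names that the cleaning step removes; the output file name is likewise an assumption.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV excerpt with stray spaces around some column names.
raw = io.StringIO(
    "Country,Year,Life expectancy , BMI \n"
    "A,2015,65.0,19.1\n"
    "B,2015,81.2,27.4\n"
    "C,2015,59.3,22.0\n"
)
df = pd.read_csv(raw)

# Data cleaning: strip leading and trailing spaces from column names.
df.columns = df.columns.str.strip()

# Summary statistics for numeric columns.
summary = df.describe()

# Histogram of life expectancy.
fig, ax = plt.subplots()
ax.hist(df["Life expectancy"].dropna(), bins=10)
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Number of records")
fig.savefig("life_expectancy_hist.png")  # hypothetical output file name
```

Without the strip step, a lookup such as df["Life expectancy"] raises a KeyError, because the actual column name ends in a space.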
Understanding Global Population Dynamics: Distribution Analysis
Explanation:
Data Preparation: Ensures consistency in column names for clarity.
Statistical Insight: Reveals key population statistics for analysis.
Visual Representation: Depicts population distribution via histogram
and KDE.
Story:
In a demographic study, data is prepared by standardizing column
names to streamline analysis. Statistical examination unveils crucial
population trends, essential for informed decision-making. Visualizing
population distributions illuminates demographic disparities, guiding
policy interventions for sustainable development.
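A KDE curve is a smoothed version of the histogram: one Gaussian bump per data point, averaged. The slides do not show how the curve was produced, so the sketch below hand-rolls it in NumPy rather than calling a plotting library. The function name, the normal-reference rule of thumb for the bandwidth, and the log10 transform are all choices made for this example; the population values are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb, a common default bandwidth.
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, averaged over all samples.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

# Hypothetical populations in millions; illustrative values only.
population_millions = np.array([1.2, 3.5, 4.1, 10.0, 33.0, 45.5, 60.2, 120.0])
log_pop = np.log10(population_millions)  # tames the long right tail

grid = np.linspace(log_pop.min() - 2, log_pop.max() + 2, 400)
density = gaussian_kde_1d(log_pop, grid)

# Histogram counts for the same data (the bars a KDE curve is drawn over).
counts, bin_edges = np.histogram(log_pop, bins=5)
```

Plotting density against grid on top of the histogram bars gives the histogram-with-KDE view described above.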
Exploring Global Economic Patterns: Distribution of GDP
Explanation:
Data Preparation: Cleans column names, ensuring uniformity.
Statistical Analysis: Computes summary statistics for GDP.
Visualization: Displays GDP distribution via a histogram with
KDE.
Story:
In an economic analysis endeavor, data is prepped by
standardizing column names for clarity. Statistical analysis
uncovers insights into global GDP trends, vital for economic
policy formulation. Visualizing GDP distributions highlights
disparities, informing strategies for balanced economic
development.
Analyzing Numeric Relationships: Correlation Heatmap
Explanation:
Data Filtering: Retains only numeric columns for correlation analysis.
Exploratory Analysis: Examines correlations between numeric
variables.
Visual Insight: Presents correlations via heatmap with annotations.
Story:
In a data exploration endeavor, non-numeric columns are filtered out
to focus on quantitative relationships.
Through exploratory analysis, correlations among numeric variables
are investigated, offering insights into interdependencies.
Visualizing correlations via a heatmap aids in identifying patterns and
guiding further analysis for informed decision-making.
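The filtering and correlation steps can be sketched in a few lines of Pandas. The sample rows are hypothetical, and Pearson correlation (the Pandas default) is assumed.

```python
import pandas as pd

# Hypothetical sample; illustrative values only.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Status": ["Developing", "Developed", "Developing", "Developed"],
    "Life expectancy": [60.0, 80.0, 65.0, 82.0],
    "Adult Mortality": [300, 70, 250, 60],
    "GDP": [600.0, 42000.0, 1500.0, 39000.0],
})

# Data filtering: keep numeric columns only.
numeric = df.select_dtypes(include="number")

# Pairwise Pearson correlation matrix (values in [-1, 1]).
corr = numeric.corr()
print(corr.round(2))

# With Seaborn installed, seaborn.heatmap(corr, annot=True) would draw
# the annotated heatmap described above.
```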
Analyzing COVID-19 Trends: Summary Statistics and Time Series Plot
Explanation:
Statistical Overview: Generates summary statistics for COVID-19 data.
Temporal Analysis: Plots trends of confirmed cases, deaths, and
recoveries over time.
Story:
In assessing the COVID-19 pandemic, summary statistics offer insights
into the overall magnitude and variability of cases.
Meanwhile, the time series plot depicts the progression of confirmed
cases, deaths, and recoveries over time, enabling the observation of trends
and fluctuations.
Through these analyses, policymakers and health authorities can better
understand the impact of the pandemic and formulate effective response
strategies.
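A time series plot needs one row per date. The sketch below assumes a daily table with a Date column and cumulative counts per country; the attribute list above has no date field, so both the column and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical daily records; the Date column is an assumption, since the
# attribute list above does not include one.
daily = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "Country/Region": ["A", "B", "A", "B"],
    "Confirmed": [10, 5, 18, 9],
    "Deaths": [0, 1, 1, 1],
    "Recovered": [1, 0, 3, 2],
})
daily["Date"] = pd.to_datetime(daily["Date"])

# Summary statistics for the count columns.
summary = daily[["Confirmed", "Deaths", "Recovered"]].describe()

# Global totals per day: one row per date, one column per metric.
trend = daily.groupby("Date")[["Confirmed", "Deaths", "Recovered"]].sum()

# If the counts are cumulative, new cases are the day-to-day difference.
trend["New cases"] = trend["Confirmed"].diff()
print(trend)
```

Calling trend[["Confirmed", "Deaths", "Recovered"]].plot() would then draw the three lines over time.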
Visualizing COVID-19 Metrics on a World Map
Explanation:
Data Loading: Imports the world map shapefile and COVID-19 data.
Data Merging: Merges COVID-19 data with world map data based on country names.
Map Visualization: Plots COVID-19 metrics (confirmed cases, deaths, recoveries) on the world map.
Story:
In mapping the global impact of COVID-19, the world map shapefile is loaded alongside COVID-19
data.
By merging these datasets, a comprehensive view of the pandemic's spread is achieved, allowing for
spatial analysis.
The resulting visualizations depict the distribution of confirmed cases, deaths, and recoveries across
countries, aiding in understanding regional variations and informing targeted response efforts.
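Merging on country names tends to fail silently where the two sources spell a name differently, leaving blank areas on the map. The sketch below shows one way to catch that with plain Pandas; the two small tables stand in for the shapefile's attribute table and the COVID-19 data, and the name column and NAME_FIXES lookup are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins: map_df plays the role of the shapefile's attribute
# table, covid the role of the COVID-19 table. Values are illustrative only.
map_df = pd.DataFrame({"name": ["United States of America", "France", "Chad"]})
covid = pd.DataFrame({
    "Country/Region": ["US", "France", "Chad"],
    "Confirmed": [500, 200, 10],
})

# Hypothetical lookup that maps COVID-19 names to the map's spelling.
NAME_FIXES = {"US": "United States of America"}
covid["name"] = covid["Country/Region"].replace(NAME_FIXES)

# Left join keeps every map row; indicator=True adds a _merge column
# that marks rows with no COVID-19 match as "left_only".
merged = map_df.merge(covid, on="name", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

An empty unmatched list means every map country found its COVID-19 row; any names left in it are candidates for the lookup table.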
Exploring the Relationship Between Life Expectancy and COVID-19 Cases
Explanation:
Data Merging: Merges datasets based on the 'Country/Region' column.
Scatter Plot: Visualizes the relationship between life expectancy and confirmed COVID-19
cases.
Story:
By merging datasets on country/region identifiers, a comprehensive dataset is created for
analysis.
Through a scatter plot, the correlation between life expectancy and confirmed COVID-19
cases is explored, offering insights into potential health disparities and vulnerabilities.
This analysis contributes to understanding how demographic factors may influence the spread
and impact of the pandemic.
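The merge step can be sketched as below. It assumes the world table has one row per country and year, so the latest year is kept before joining, and that the country column is named Country there and Country/Region in the COVID-19 table, as in the attribute lists. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical samples; illustrative values only.
world = pd.DataFrame({
    "Country": ["A", "A", "B", "C"],
    "Year": [2014, 2015, 2015, 2015],
    "Life expectancy": [64.0, 65.0, 81.0, 59.0],
})
covid = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "Confirmed": [1000, 20000, 500, 300],
})

# The world table has one row per country and year; keep the latest year
# so the merge yields one row per country.
latest = world.sort_values("Year").groupby("Country", as_index=False).last()

# Inner join: only countries present in both tables survive.
merged = latest.merge(covid, left_on="Country", right_on="Country/Region", how="inner")

# Pearson correlation between the two plotted variables.
r = merged["Life expectancy"].corr(merged["Confirmed"])
print(merged[["Country", "Life expectancy", "Confirmed"]])
print(r)
```

Country D drops out of the inner join because it has no life expectancy record, which is worth keeping in mind when reading the scatter plot: it shows only countries present in both sources.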
Examining Life Expectancy and COVID-19 Cases Across Development Status
Explanation:
Data Visualization: Displays a scatter plot of life expectancy against confirmed
COVID-19 cases.
Color Representation: Categorizes data points by development status,
enhancing visual interpretation.
Story:
In this scatter plot analysis, life expectancy is juxtaposed against confirmed
COVID-19 cases, with data points color-coded according to development
status.
The visualization facilitates the identification of potential correlations between
socioeconomic factors and pandemic outcomes.
By examining disparities across development statuses, policymakers can tailor
interventions to address health inequities and mitigate the impact of the
pandemic on vulnerable populations.
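Color coding by development status can be done with one scatter call per group. The sketch below uses Matplotlib on a hypothetical merged table; the logarithmic y axis and the output file name are choices made for this example, not something the slides specify.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical merged table; illustrative values only.
merged = pd.DataFrame({
    "Status": ["Developing", "Developed", "Developing", "Developed"],
    "Life expectancy": [60.0, 80.0, 65.0, 82.0],
    "Confirmed": [1200, 250000, 40000, 90000],
})

fig, ax = plt.subplots()
# One scatter call per status gives each group its own color and legend entry.
for status, group in merged.groupby("Status"):
    ax.scatter(group["Life expectancy"], group["Confirmed"], label=status)

ax.set_yscale("log")  # case counts span orders of magnitude
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Confirmed cases (log scale)")
ax.legend(title="Status")
fig.savefig("life_expectancy_vs_cases.png")  # hypothetical output file name
```

Raw case counts also scale with population size, so dividing Confirmed by Population before plotting may give a fairer comparison between countries.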
Exploring GDP and COVID-19 Cases Across Development Status
Explanation:
Visualization: Presents a scatter plot of GDP against confirmed COVID-19
cases.
Color Coding: Categorizes data points by development status for enhanced
insight.
Story:
This scatter plot delves into the relationship between GDP and confirmed
COVID-19 cases, with data points color-coded based on development status.
By examining how economic factors intersect with pandemic outcomes,
stakeholders can better understand the socioeconomic dimensions of the crisis.
This analysis aids in identifying disparities in vulnerability and resilience
across different economic strata, guiding targeted interventions for pandemic
response and recovery.
Comparing Average GDP Across WHO Regions
Explanation:
Visualization: Presents a bar plot of average GDP by WHO region.
Data Aggregation: Computes the mean GDP for each WHO region for
comparison.
Story:
This bar plot showcases the average GDP across different WHO regions,
providing insights into regional economic disparities.
By aggregating GDP data, the plot highlights variations in economic
development among WHO regions.
Stakeholders can use this analysis to prioritize resource allocation and
development initiatives, aiming to address economic inequalities and
promote sustainable growth globally.
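The aggregation behind the bar plot is a group-wise mean. The sketch below assumes a merged table that already carries both a WHO Region and a GDP column; region labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical merged table; region labels and values are illustrative only.
merged = pd.DataFrame({
    "WHO Region": ["Europe", "Europe", "Africa", "Africa", "Americas"],
    "GDP": [40000.0, 30000.0, 1000.0, 3000.0, 20000.0],
})

# Mean GDP per region, sorted so the bar plot reads from highest to lowest.
avg_gdp = merged.groupby("WHO Region")["GDP"].mean().sort_values(ascending=False)
print(avg_gdp)

# avg_gdp.plot(kind="bar") would draw the bar plot with Matplotlib.
```

The mean skips missing GDP values by default, so regions with many gaps are averaged over fewer countries than the bar suggests.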
Comparing Average Life Expectancy Across WHO Regions
Explanation:
Visualization: Displays a bar plot of average life expectancy by WHO region.
Statistical Summary: Calculates the mean life expectancy for each WHO region for
comparison.
Story:
This bar plot illustrates the average life expectancy across various WHO regions, shedding
light on regional health disparities.
By analyzing average life expectancy data, stakeholders can identify regions with lower life
expectancies and prioritize health interventions accordingly.
This analysis aids in understanding global health outcomes and guiding efforts to improve
population health and well-being across different regions.
Microsoft Power BI
Work Management
Work Completed:
Data Acquisition and Cleaning:
Obtained COVID-19, global population, and economic datasets. Cleaned data to ensure consistency and
accuracy.
Initial Visualization Using D3.js:
Created a pie chart of COVID-19 cases by WHO region using D3.js.
Stacked Bar Chart of COVID-19 Cases:
Developed a stacked bar chart showing COVID-19 cases by country using D3.js.
Cleaning Datasets Using Python and Visualization with Matplotlib:
Cleaned datasets using Python. Visualized global health data with Matplotlib.
Exploring Global Population Dynamics:
Analyzed population distributions and trends.
Exploring Global Economic Patterns:
Examined GDP distributions and correlations with COVID-19 cases.
Understanding Numeric Relationships:
Investigated correlations between variables using a heatmap.
Work Management (Cont.)
Analyzing COVID-19 Trends:
Generated summary statistics and time series plots for COVID-19 data.
Visualizing COVID-19 Metrics on a World Map:
Plotted COVID-19 metrics on a world map.
Exploring Relationships Between Life Expectancy and COVID-19:
Investigated the relationship between life expectancy and COVID-19 cases.
Links
Dashboard Link:
https://app.powerbi.com/groups/32943138-8025-48cd-9a85-9df66bd1864a/dashboards/fab773de-71a6-474f-9117-19982ed0492f?ctid=70de1992-07c6-480f-a318-a1afcba03983&pbi_source=linkShare
VizHub code:
https://vizhub.com/Vikramkotlo09/3e4a96b9055a47c6a872c51257676511?edit=files&file=index.html
VizHub visualization:
https://vizhub.com/Vikramkotlo09/3e4a96b9055a47c6a872c51257676511?mode=embed
References:
1. World Health Organization. (2022). COVID-19 Dashboard. [Online]. Available:
https://www.who.int/emergencies/disease-outbreak-news/item/2020-DON233
[Accessed: April 16, 2024].
2. In-class activities provided background knowledge and reference material for this project.
3. D3.js Documentation: https://d3js.org/
4. Matplotlib Documentation: https://matplotlib.org/stable/contents.html
5. Pandas Documentation: https://pandas.pydata.org/docs/